298 research outputs found
Work-in-Progress: Quantized NNs as the Definitive solution for inference on low-power ARM MCUs?
High energy efficiency and low memory footprint are the key requirements for the deployment of deep learning based analytics on low-power microcontrollers. Here we present work-in-progress results with Q-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7 class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for Q=4 and Q=2 low memory footprint QNNs can be deployed with an energy overhead of 30% and 36% respectively against the 8-bit CMSIS-NN due to the lack of quantization support in the ISA; ii) for Q=1 native instructions can be used, yielding an energy and latency reduction of ∼3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for Q=4, 13.6× for Q=2 and 6.5× for binary NNs
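The abstract does not say which native instructions serve the Q=1 case. A common formulation for binary networks, assumed here purely for illustration, replaces multiply-accumulate with XNOR followed by a population count. The helper names `pack_bits` and `binary_dot` are hypothetical and are not taken from CMSIS-NN or from the paper.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two n-element {-1, +1} vectors given their packed words."""
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask   # XNOR: 1 where the two signs match
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Each matching bit pair contributes +1 and each mismatch contributes -1, so the result equals the ordinary dot product stored in `reference`.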
Leveraging Automated Mixed-Low-Precision Quantization for Tiny Edge Microcontrollers
Severe on-chip memory limitations currently prevent the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, 8 bits, of individual weight and activation tensors, under the tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for ImageNet classification. Concerning the quantization policy search, the RL agent selects quantization policies that maximize the memory utilization. Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions quantized with a non-uniform function, which is not tailored for CPUs featuring integer-only arithmetic. This shows the viability of uniform quantization, required for MCU deployments, for deep weight compression. When the activation memory budget is also limited to 512 kB, the best MobileNetV1 model scores up to 68.4% on ImageNet thanks to the quantization policy found, making it 4% more accurate than the other 8-bit networks fitting the same memory constraints
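As a rough illustration of the two ingredients named above, the sketch below shows a symmetric uniform quantizer and a check of a per-tensor bit-width policy against a weight memory budget. The exact quantizer and search procedure of the paper are not described in the abstract. The function names, the layer sizes, and the reading of 2 MB as 2 × 1024 × 1024 bytes are all assumptions.

```python
def quantize_uniform(values, bits):
    """Symmetric uniform quantization to signed integers; returns (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / qmax
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, scale


def weight_bytes(layer_sizes, policy):
    """Bytes needed to store each layer's weights at its assigned bit width."""
    total_bits = sum(n * b for n, b in zip(layer_sizes, policy))
    return (total_bits + 7) // 8                  # round up to whole bytes


def fits_budget(layer_sizes, policy, budget_bytes):
    """True when the policy's weight footprint stays within the budget."""
    return weight_bytes(layer_sizes, policy) <= budget_bytes


# Hypothetical three-layer model: number of weights per layer.
layer_sizes = [1_000_000, 2_000_000, 500_000]
budget = 2 * 1024 * 1024                          # assumed meaning of "2 MB"

all_8bit = [8, 8, 8]
mixed = [8, 2, 4]
fits_8bit = fits_budget(layer_sizes, all_8bit, budget)
fits_mixed = fits_budget(layer_sizes, mixed, budget)
```

In this made-up example the uniform 8-bit policy needs 3,500,000 bytes and exceeds the budget, while the mixed policy needs 1,750,000 bytes and fits. A search agent would explore many such policies and keep the most accurate one that passes the check.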
Decadal Variability in the Northeast Pacific in a Physical-Ecosystem Model: Role of Mixed Layer Depth and Trophic Interactions
A basin-wide interdecadal change in both the physical state and the ecology of the North Pacific occurred near the end of 1976. Here we use a physical-ecosystem model to examine whether changes in the physical environment associated with the 1976-1977 transition influenced the lower trophic levels of the food web and if so by what means. The physical component is an ocean general circulation model, while the biological component contains 10 compartments: two phytoplankton, two zooplankton, two detritus pools, nitrate, ammonium, silicate, and carbon dioxide. The model is forced with observed atmospheric fields during 1960-1999. During spring, there is a ~40% reduction in plankton biomass in all four plankton groups during 1977-1988 relative to 1970-1976 in the central Gulf of Alaska (GOA). The epoch difference in plankton appears to be controlled by the mixed layer depth. Enhanced Ekman pumping after 1976 caused the halocline to shoal, and thus the mixed layer depth, which extends to the top of the halocline in late winter, did not penetrate as deep in the central GOA. As a result, more phytoplankton remained in the euphotic zone, and phytoplankton biomass began to increase earlier in the year after the 1976 transition. Zooplankton biomass also increased, but then grazing pressure led to a strong decrease in phytoplankton by April followed by a drop in zooplankton by May; essentially, the mean seasonal cycle of plankton biomass was shifted earlier in the year. As the seasonal cycle progressed, the difference in plankton concentrations between epochs reversed sign again, leading to slightly greater zooplankton biomass during summer in the later epoch
Mixed-data-model heterogeneous compilation and OpenMP offloading
Heterogeneous computers combine a general-purpose host processor with domain-specific programmable many-core accelerators, uniting high versatility with high performance and energy efficiency. While the host manages ever-more application memory, accelerators are designed to work mainly on their local memory. This difference in addressed memory leads to a discrepancy between the optimal address width of the host and the accelerator. Today 64-bit host processors are commonplace, but few accelerators exceed 32-bit addressable local memory, a difference expected to increase with 128-bit hosts in the exascale era. Managing this discrepancy requires support for multiple data models in heterogeneous compilers. So far, compiler support for multiple data models has not been explored, which hampers the programmability of such systems and inhibits their adoption. In this work, we perform the first exploration of the feasibility and performance of implementing a mixed-data-model heterogeneous system. To support this, we present and evaluate the first mixed-data-model compiler, supporting arbitrary address widths on host and accelerator. To hide the inherent complexity and to enable high programmer productivity, we implement transparent offloading on top of OpenMP. The proposed compiler techniques are implemented in LLVM and evaluated on a 64+32-bit heterogeneous SoC. Results on benchmarks from the PolyBench-ACC suite show that memory can be transparently shared between host and accelerator at overheads below 0.7% compared to 32-bit-only execution, enabling mixed-data-model computers to execute at near-native performance
Growth variations and scattering mechanisms in metamorphic In0.75Ga0.25As/In0.75Al0.25As quantum wells grown by molecular beam epitaxy
Modulation-doped metamorphic In0.75Ga0.25As/In0.75Al0.25As quantum wells (QWs) were grown on GaAs substrates by molecular beam epitaxy (MBE) with step-graded buffer layers. The electron mobility of the QWs has been improved by varying the MBE growth conditions, including substrate temperature, arsenic overpressure and modulation doping level. By applying a bias voltage to SiO2 insulated gates, the electron density in the QW can be tuned from 1×10¹¹ to 5.3×10¹¹ cm⁻². A peak mobility of 4.3×10⁵ cm² V⁻¹ s⁻¹ is obtained at 3.7×10¹¹ cm⁻² at 1.5 K before the onset of second subband population. To understand the evolution of mobility, transport data is fitted to a model that takes into account scattering from background impurities, modulation doping, alloy disorder and interface roughness. According to the fits, scattering from background impurities is dominant while that from alloy disorder becomes more significant at high carrier density
Low-frequency variability in the Gulf of Alaska from coarse and eddy-permitting ocean models
An eddy-permitting ocean model of the northeast Pacific is used to examine the ocean adjustment to changing wind forcing in the Gulf of Alaska (GOA) at interannual-to-decadal timescales. It is found that the adjustment of the ocean model in the presence of mesoscale eddies is similar to that obtained with coarse-resolution models. Local Ekman pumping plays a key role in forcing pycnocline depth variability and, to a lesser degree, sea surface height (SSH) variability in the center of the Alaska gyre and in some areas of the eastern and northern GOA. Westward Rossby wave propagation is evident in the SSH field along some latitudes but is less noticeable in the pycnocline depth field. Differences between SSH and pycnocline depth are also found when considering their relationship with the local forcing and leading modes of climate variability in the northeast Pacific. In the central GOA, pycnocline depth variations are more clearly related to changes in the local Ekman pumping than SSH variations are. While SSH is marginally correlated with both Pacific Decadal Oscillation (PDO) and North Pacific Gyre Oscillation (NPGO) indices, the pycnocline depth evolution is primarily related to NPGO variability. The intensity of the mesoscale eddy field increases with increasing circulation strength. The eddy field is generally more energetic after the 1976–1977 climate regime shift, when the gyre circulation intensified. In the western basin, where eddies primarily originate from intrinsic instabilities of the flow, variations in eddy kinetic energy are significantly correlated with the PDO index, indicating that eddy statistics may be inferred, to some degree, from the characteristics of the large-scale flow
Multi-Color Imaging of Magnetic Co/Pt Multilayers
We demonstrate for the first time the realization of a spatially resolved, two-color, element-specific imaging experiment at the free-electron laser facility FERMI. Coherent imaging using Fourier transform holography was used to achieve direct real-space access to the nanometer length scale of magnetic domains of Co/Pt heterostructures via the element-specific magnetic dichroism in the extreme ultraviolet spectral range. As a first step to implement this technique for studies of ultrafast phenomena, we present the spatially resolved response of magnetic domains upon femtosecond laser excitation
Two-dimensional electron gas formation in undoped In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As quantum wells
We report on the achievement of a two-dimensional electron gas in completely
undoped In[0.75]Al[0.25]As/In[0.75]Ga[0.25]As metamorphic quantum wells. Using
these structures we were able to reduce the carrier density, with respect to
reported values in similar modulation-doped structures. We found experimentally
that the electronic charge in the quantum well is likely due to a deep-level
donor state in the In[0.75]Al[0.25]As barrier band gap, whose energy lies
within the In[0.75]Ga[0.25]As/In[0.75]Al[0.25]As conduction band discontinuity.
This result is further confirmed through a Poisson-Schroedinger simulation of
the two-dimensional electron gas structure. Comment: 17 pages, 6 figures, to be published in J. Vac. Sci. Technol.
Seeded x-ray free-electron laser generating radiation with laser statistical properties
The invention of optical lasers led to a revolution in the field of optics
and even to the creation of completely new fields of research such as quantum
optics. The reason was their unique statistical and coherence properties. The
newly emerging, short-wavelength free-electron lasers (FELs) are sources of
very bright coherent extreme-ultraviolet (XUV) and x-ray radiation with pulse
durations on the order of femtoseconds, and are presently considered to be
laser sources at these energies. Most existing FELs are highly spatially
coherent but in spite of their name, they behave statistically as chaotic
sources. Here, we demonstrate experimentally, by combining Hanbury Brown and
Twiss (HBT) interferometry with spectral measurements that the seeded XUV FERMI
FEL-2 source does indeed behave statistically as a laser. The first steps have
been taken towards exploiting the first-order coherence of FELs, and the
present work opens the way to quantum optics experiments that strongly rely on
high-order statistical properties of the radiation. Comment: 24 pages, 10 figures, 37 references
NEURAghe: Exploiting CPU-FPGA synergies for efficient and flexible CNN inference acceleration on Zynq SoCs
Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition. However, their computational load is significant, motivating the development of CNN-specialized accelerators. This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs. NEURAghe leverages the synergistic usage of Zynq ARM cores and of a powerful and flexible Convolution-Specific Processor deployed on the reconfigurable logic. The Convolution-Specific Processor embeds both a convolution engine and a programmable soft core, releasing the ARM processors from most of the supervision duties and allowing the accelerator to be controlled by software at an ultra-fine granularity. This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of the computational graph, taking advantage of the NEON vector engines to further speed up computation. Through the companion NeuDNN SW stack, NEURAghe supports end-to-end CNN-based classification with a peak performance of 169 GOps/s and an energy efficiency of 17 GOps/W. Thanks to our heterogeneous computing model, our platform improves upon the state-of-the-art, achieving a frame rate of 5.5 frames per second (fps) on the end-to-end execution of VGG-16 and 6.6 fps on ResNet-18